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Abstract 

Genome-wide association studies (GWASs) may help to understand the effects of genetic polymorphisms on breast cancer 
(BC) progression and survival. However, they give only a focused view, which cannot capture the tremendous complexity of 
this disease. Therefore, we investigated data from a previously conducted GWAS on BC survival for enriched pathways by 
different enrichment analysis tools using the two main annotation databases Gene Ontology (GO) and Kyoto Encyclopedia 
of Genes and Genomes (KEGG). The goal was to identify the functional categories (GO terms and KEGG pathways) that are 
consistently overrepresented in a statistically significant way in the list of genes generated from the single nucleotide 
polymorphism (SNP) data. The SNPs with allelic p-value cut-offs 0.005 and 0.01 were annotated to the genes by excluding or 
including a 20 kb up-and down-stream sequence of the genes and analyzed by six different tools. We identified eleven 
consistently enriched categories, the most significant ones relating to cell adhesion and calcium ion binding. Moreover, we 
investigated the similarity between our GWAS and the enrichment analyses of twelve published gene expression signatures 
for breast cancer prognosis. Five of them were commonly used and commercially available, five were based on different 
aspects of metastasis formation and two were developed from meta-analyses of published prognostic signatures. This 
comparison revealed similarities between our GWAS data and the general and the specific brain metastasis gene signatures 
as well as the Oncotype DX signature. As metastasis formation is a strong indicator of a patient's prognosis, this result 
reflects the survival aspect of the conducted GWAS and supports cell adhesion and calcium signaling as important pathways 
in cancer progression. 
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Introduction 

Worldwide, breast cancer (BC) is the most common cancer 
among women, comprising 23% of all female cancer. Each year, 
about 1.4 million new cases are diagnosed and about 460,000 
women die of this disease [1]. It has been shown that survival of 
BC is in part heritable which can possibly be explained by yet 
unknown genetic factors [2]. Further knowledge about the effects 
of inherited genetic variation on BC survival can help to predict 
the patient's individual risk for disease progression and survival 
probabilities and to develop new and better therapies and 



prevention strategies. A genome-wide association study (GWAS) 
is a powerful tool to search for a genetic influence on complex 
traits. Within the last six years 34 GWASs on breast cancer have 
been performed identifying 194 new susceptibility loci (http:// 
www.genome.gov/gwastudies). Also three GWASs on breast 
cancer survival have been conducted leading only to three 
prognostic loci [3-5]. Therefore, a more global view on GWAS 
data can reveal new insights in cancer formation and progression 
and give new clues for further investigations. 

A good tool to set high-throughput data into a global context is 
a pathway enrichment analysis [6]. The gene-group-based 
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SNPs of the GWAS on BC survival 

/allelic SNP p-value cut-offs (table 1 ) V, 
0.0001,0.001, 0.005, 0.01 and 0.05 j| 

gene lists based on SNPs gene lists based on SNPs located within 

located in a gene region a gene ±20 kb surrounding region 

\ * 

two gene lists with allelic p-value cut-offs of 0.01 two gene lists with allelic p-value cut-offs of 0.01 

and 0.005 as input for six enrichment analysis tools and 0.005 as input for six enrichment analysis tools 

I * 

lists of overrepresented categories (table 2) lists of overrepresented categories (table 2) 

comparison of the overrepresented categories 
0.005 gene list vs. 0.005 ±20 kb gene list 
0.01 gene list vs. 0.01 ±20 kb gene list 

\ 

overlapping categories of the 0.005 gene lists and the 0.01 gene lists 

\ 

overlapping categories obtained by the different tools 

\ 

consistently enriched categories (table 3) 



Figure 1. Flow chart of the pathway enrichment analysis of the GWAS on BC survival. 

doi:1 0.1 371 /journal.pone.0098229.g001 



approach increases the likelihood to identify the biological 
processes which are overrepresented in the high-throughput data 
and have a high impact on the studied disease. The most 
commonly used gene annotation databases are Gene Ontology 
(GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), 
in which the biological knowledge about genes and their associated 
processes and pathways are collected. This knowledge can be used 
by pathway enrichment tools, which map the genes of the 
investigated list to the associated biological annotation terms of the 
database. Then, the customized enrichment result is compared to 
the control background and an enrichment /j-value is calculated 
and corrected for multiple testing. Currently, a huge variety of 
different pathway enrichment tools are available. 

Some of the tools input lists of genes or proteins and output 
enriched pathways. Others take the locations of single nucleotide 
polymorphisms (SNPs) into consideration, and thus gene lists can 
be derived from GWAS data. The aim of our study was to submit 
a GWAS on BC survival to a pathway enrichment analysis. In the 
GWAS, the genotype data of women of Western European origin 
with long and short time survival after the diagnosis of BC were 



compared. Pathway enrichment analysis was conducted using six 
different enrichment tools on four final gene lists. The gene lists 
based on our SNP data with allelic /J-value cut-offs 0.005 and 0.01 
and with a gene annotation by excluding or including a 20 kb up- 
and down-stream sequence of the gene. Only those categories 
which were enriched in all four lists and more than one tool were 
considered to be consistently enriched. We were also interested 
whether our results are supported by gene signatures on breast 
cancer prognosis derived from gene expression profiling studies. 
Therefore, we performed pathway enrichment analyses with 
several commonly used prognostic gene signatures and compared 
the results with our GWAS data. 

Materials and Methods 

Ethics Statement 

All participants in the GWAS gave written informed consent to 
the use of their samples for research purpose. The study was 
approved by the ethical committee of each participating institute. 



Table 1. Number of SNPs and genes corresponding to allelic p-value cut-offs of the GWAS on BC survival. 





p-value 


total No. of SNPs 


No. of SNPs within a gene 


No. of genes 


No. of SNPs ±20 kb 


No. of genes ±20 kb 


<0.05 


4572 


1664 


1015 


2525 


1576 


-C0.01 


3080 


1137 


737 


1725 


1143 


<0.005 


1607 


576 


402 


746 


638 


<0.001 


329 


112 


83 


163 


125 


-C0.0001 


40 


9 


9 


12 


10 
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GWAS 

The GWAS on BC survival was a population based case-only 
study, in which the BC patients were divided in two groups based 
on their survival time. We considered as cases 369 women with 
short-time survival (less than 6 years after breast cancer diagnosis) 
which were compared with a group consisting of 369 women with 
long-time survival (S 1 1 years after breast cancer diagnosis) as 
controls. Details of the characteristics of the study population are 
found in the table SI. The cases and controls were selected from 
four cohorts and matched for age at diagnosis (<40, 40-49, 50-59 
and >60 years), gender, diagnosis period (1985-1989, 1990-1994 
and 1995-) and cohort (table S2). Blood samples were prospec- 
tively collected in each cohort. The cases and controls were 
identified from the cohorts by record linkage to the regional cancer 
registries. Follow-up was performed until December, 31st, 2007 
and the data were available for every patient. The Vasterbotten 
intervention project (VIP), the mammary screening project (MSP) 
and the Department of Oncology, Norrlands University Hospital, 
Umea, Sweden, contributed with 96 cases and 96 controls [7]. 
Within VIP, blood samples have been collected since 1985, within 
MSP since 1995, with subsequent BC diagnosis during the years 
1988-2005. Norrlands University Hospital collects blood samples 
consecutively since 1990 from newly diagnosed BC patients and 
43 BC patients, not included in VIP or MSP, were included in the 
study. The Malmo Diet and Cancer Study, Malmo, Sweden 
contributed 44 cases and 44 controls [8,9]. Blood samples were 
collected between 1991 and 1996, prior to BC diagnosis between 
1991 and 2005. The third sample set comprised 82 cases and 14 
controls from the Stadtisches Klinikum Karlsruhe and Deutsches 
Krebsforschungszentrum Breast Cancer Study (SKKDKFZS) and 
68 controls from the Umea cohort. The SKKDKFZS consists of 
women between 21-93 years of age at diagnosis with patholog- 
ically confirmed breast cancer recruited at the Stadtisches 
Klinikum Karlruhe, Karlsruhe, Germany from 1993-2005 and 
a blood sample collected at the time of diagnosis. The Icelandic 
Cancer Society and University of Iceland Biobank contributed 
with 147 cases and 147 controls with BC diagnosis during the 
years 1983-2004 [10]. 

A genome-wide scan of ~ 300,000 tagging SNPs was conducted 
using the Illumina HumanCytoSNP- 1 2 vl according to the 
manufacturer's protocols. Before analysis, markers with one or 
more of the following criteria were excluded: <90% genotype call 
rate, minor allele frequency <5% or Hardy- Weinberg equilibri- 
um exact /-value <10~ . Genotype calling was done using 
Illumina GenomeStudio 2010. The GWAS was conducted by 
PLINK vl.06, with the option of "model" to perform a Cochran- 
Armitage and a full-model case-control association test. 

Enrichment Analysis 

The GWAS data were investigated for SNPs which were 
annotated to a gene and located within the 5'UTR, 3'UTR, 
introns and exons of the gene, alternatively within a genomic 
region including up to 20 kb up- and downstream of a gene locus. 
Different aUelic Rvalue cut-offs (0.05, 0.01, 0.005, 0.001 and 
0.0001) were set to generate gene lists for both scenarios. If there 
was more than one SNP per gene meeting the selection criteria, 
the SNP with the lowest /-value was taken into account. Finally, 
four gene lists, two per scenario, with the allelic /(-value cut offs of 
0.01 and 0.005 were selected as input for six enrichment analysis 
tools (ConsensusPathDB, DAVID, FatiGO, GATHER, GeneCo- 
dis and WebGestalt) using the two main annotation databases 
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and 
Genomes (KEGG) as basis. These pathway enrichment tools were 
selected based on our previous experience on pathway enrichment 
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Table 3. Consistently enriched categories of the GWAS on BC survival. 





GO Annotations (6 tools) 


Number of 
tools 


Number of genes In 
category 


Number of GWAS genes* In category 


GO:0005509 


calcium ion binding 


3 


685 


67 


GO:0007155 


cell adhesion 


4 


958 


52 


KEGG Pathways (4 tools) 


KEGG 04520 


Adherens junction 


3 


59 


9 


KEGG 05412 


Arrhythmogenic right ventricular cardiomyopathy 


2 


65 


10 


KEGG 04360 


Axon guidance 


3 


80 


15 


KEGG 04020 


Calcium signaling pathway 


3 


139 


17 


KEGG 05414 


Dilated cardiomyopathy 


3 


82 


11 


KEGG 04512 


ECM-receptor interaction 


4 


57 


8 


KEGG 04510 


Focal adhesion 


3 


134 


15 


KEGG 00512 


O-Glycan biosynthesis 


2 


11 


7 


KEGG 05222 


Small cell lung cancer 


2 


65 


6 


Only categories enriched in all four gene lists and by more than one tool were considered consistently enriched. * Genes present in the 0.01 gene list (allelic SNP p-value 



cut-off 0.01). 
doi:1 0.1 371 /journal.pone.0098229.t003 



analyses [1 1]. The selection criteria included free availability, user- 
friendly handling, the usage of gene names as input and the GO 
and the KEGG database as basis, variations in stringency, test 
statistics and multiple comparison adjustment methods. The four 
gene lists were selected because they provided the enrichment tools 
with an applicable number of genes to run the enrichment 
analyses successfully. With a too stringent allelic SNP Rvalue cut- 
off, too few genes serve as input resulting in no significandy 
enriched categories. A too tolerant allelic p-value cut-off increases 
the background noise and may result either in too many unspecilic 
or no enriched categories [12,13]. For all tools the same conditions 
were applied, which were a significance threshold of 0.05 for the 
adjusted enrichment p-value, at least two genes from the input list 
in the enriched category and the whole genome as reference 
background. The goal was to identify the functional categories 
(GO terms and KEGG pathways) that are consistendy overrep- 
resented in a statistically significant way in the list of SNPs inferred 
from the GWAS on BC prognosis. The used tools and their 
characteristics can be seen in table S3. 

Consistently Enriched Categories 

For DAVID, FatiGO and GATHER, the tool's default Rvalue 
cut-off of 0.05 generated a list of 20-30 enriched categories for the 
comparison. However, for Consensus PathDB, GeneCodis and 
WebGestalt, this p-value cut-off generated lists of up to 176, 278 
and 77 enriched GO terms and we used the more stringent 
enrichment /(-value cut-offs of lxlO -6 , lxlO -4 and 0.01, 
respectively. The enriched categories of each allelic Rvalue cut- 
off gene list based on the SNPs within a gene region were 
compared to those of the gene list taking also the SNPs within the 
±20 kb spanning region into account (0.01 list vs. 0.01 ±20 kb list; 
0.005 list vs. 0.005±20 kb list). This was done for every tool 
separately. Then, the overlaps of the two different allelic Rvalue 
cut-offs were compared to each other. Finally, we compared the 
results of all tools to each other. Only categories enriched in all 
four gene lists and by more than one tool were considered 
consistently enriched (figure 1). 



Prognostic Gene Expression Signatures 

Literature was searched for commonly used prognostic gene 
signatures derived from breast cancer expression data. Twelve 
gene expression signatures were selected for further pathway 
enrichment analysis conducted by MetaCore GeneGo pathway 
enrichment analysis (false discovery rate (FDR) cut-off 0.05) 
because this tool enables the pathway enrichment analysis of two 
gene lists simultaneously and compares the results to each other. 
Also the 0.01 gene list of our GWAS was analyzed again with this 
tool to make the results comparable to the ones of the gene 
expression signatures. 

Results 

Systematic Enrichment Analyses of the GWAS Data 

The consecutive steps of the pathway enrichment analysis are 
summarized in figure 1 . The GWAS data were filtered for SNPs 
located within a gene (5'UTR, 3'UTR, intron and exon), as well 
as for SNPs located in a genomic region 20 kb up- and 
downstream from a gene locus to take also genetic effects in 
regulatory regions into account. Five different p-value cut-offs for 
both scenarios were set to generate gene lists (table 1). The gene 
lists based on SNPs created by the p-value cut-offs 0.01 and 0.005, 
consisting of 737 and 1143 genes and 402 and 638 genes, 
respectively, provided the enrichment tools with an applicable 
number of genes to run the enrichment analyses successfully. 

These four gene lists served as input in six different pathway 
enrichment analysis tools under the same setting. The number of 
enriched GO terms/KEGG pathways differed enormously 
between the enrichment analysis tools due to the individual tool 
features although the same analysis conditions were assigned 
(table 2). The ConsensusPathDB and GeneCodis tool reported in 
general much more enriched GO terms than the other used tools. 
For example, they generated 176 and 278 overrepresented GO 
annotations, respectively, when the 0.01±20kb gene list was 
analyzed. As comparison, DAVID and FatiGO reported only 4 
enriched categories each. 

To reduce the number of enriched categories, the results of the 
two gene lists with an allelic p-value cut-off of 0.005 were 
compared with each other. This was also done for the two gene 
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Muscle contraction_GPCRs in the regulation ofsmooth muscle tone 
Cytoskeleton remodeling_Role of PKA in cytoskeleton reorganisation 
Muscle contraction_ACM regulation of smooth muscle contraction 
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O-glycan biosynthesis 
O-glycan biosynthesis / Human version 
Development_Beta-adrenergic receptors signaling via cAMP 
Neurophysiological process_Nethn-1 in regulation of axon guidance 
Neurophysiological process_Receptor-mediated axon growth repulsion 
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Neurophysiological process_ACM regulation of nerve impulse 
Cell adhesion_Alpha-4 integrins in cell migration and adhesion 
Signal transduction_Activatjon of PKC via G-Protein coupled receptor 
Cytoskeleton remodeling_Regulation of actin cytoskeleton byRho GTPases 
Signal transductioncAMP signaling 
Neurophysiological process_Delta-type opioid receptor in the nervous system 
Apoptosis and survival_BAD phosphorylation 
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Figure 2. Top 25 GeneGO pathways enriched by the 0.01 gene list derived from the GWAS data, red numbers = significant at FDR of 0.05. 
doi:1 0.1 371 /journal.pone.0098229.g002 



lists with a. p-vahie cut-off of 0.01 and the resulting overlaps were 
compared with each other. This was done separately for every 
tool. To define consistendy enriched categories, the categories had 
to be overrepresented by at least two different tools. After this 
comparison eleven categories remained: two GO terms, which 
were "calcium ion binding" and "cell adhesion" and nine KEGG 
pathways named "adherens junction", "arrythmogenic right 
ventricular cardiomyopathy", "axon guidance", "calcium signal- 
ing", "dilated cardiomyopathy", "ECM-receptor interaction", 
"focal adhesion", "O-glycan-biosynthesis" and "small cell lung 
cancer" (table 3). Most categories were reported by three or four 
tools. 

We compared the genes of every category to each other to 
detect overlaps of the pathways to define the consistendy enriched 
categories (table 4). The cross-tabulation revealed a strong 
association of "cell adhesion" genes with all pathways except for 
the genes in "calcium signaling" and "O-glycan biosynthesis". 
Moreover, we investigated the overlap of our GWAS genes in the 
pathways, resulting in a similar outcome (table 5). Based on this 
analysis most categories were summarized in two overarching 
categories: 

1. "Cell adhesion" with its 52 GWAS genes combined different 
kinds of cell adhesion processes, such as the KEGG pathways 
"adherens junction", "ECM-receptor interaction", "focal 



adhesion", as well as "small cell lung cancer" and to a lesser 
extent "axon guidance". 

2. "Calcium ion binding" characterizes the group of KEGG 
pathways "arrythmogenic right ventricular cardiomyopathy", 
"calcium signaling", "dilated cardiomyopathy" and "O-glycan 
biosynthesis". 

Additionally, the gene overlap of 20-30% between the two GO 
terms "calcium ion binding" and "cell adhesion" supports a 
connection of these two annotations. 

Enrichment Analysis of the 0.01 Gene List with MetaCore 

As we wanted to compare our GWAS pathways with prognostic 
expression signatures, the longer 0.01 gene list was further 
analyzed by GeneGo, the pathway enrichment analysis tool of 
MetaCore, which allows simultaneous analysis and comparison of 
two gene lists. Fifteen pathways passed the significance level 
defined by a FDR of 0.05 and the analysis confirmed the 
importance of cell adhesion, axon guidance and calcium signaling 
(figure 2) in the GWAS survival signature. Although the GeneGo 
pathway enrichment analysis uses its own pathway terms, they are 
similar to the GO terms or KEGG pathways. Also the O-Glycan 
biosynthesis was found in the top 5 enriched pathways. The five 
most common terms, cell adhesion, cytoskeleton remodeling, 
development, muscle contraction and neurophysiological process, 
constituted 56% of the top 50 pathways (table 6, table S4). 
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Table 7. Prognostic gene expression signatures selected for pathway enrichment analysis. 



Signature 
name 


Author 


Year of 
publication 


No. of 
genes 


Study design 


Outcome 




Mammaprint 


Van't Veer 
ef al. [14] 


2002 


70 


78 patients with sporadic primary breast tumors: 
<5 cm, NO, age <55 years; 34 patients developed 
distant metastasis <5 years vs. 44 patients: 
disease-free >5 years 


Prognosis for distant 
metastasis 




Oncotype DX 


Paik et al. 
[15] 


2004 


21 


668 tumors from patients: NO and ER+, 
treated with tamoxifen 


Prognosis for distant 
recurrence/overall survival 




MapQuant 


Sotirou et al. 
[16] 


2006 


97 


64 samples: ER+, grade 1 vs. grade 3 


Prognosis for recurrence/ 
relapse-free survival 




Gene Search 


Wang et al. 

n 71 
L l /J 


2005 


76 


115 tumors: all NO; 80 samples ER+, 
35 ER-, analyzed separately for 
distant tumor recurrence, 
then combined 


Prognosis for distant 
tumor recurrence 




Wound response 
signature 


Chang et al. 
L loj 


2004 


512 


50 fibroblast culturesfrom 10 
anatomic sites: response of 
fibroblast to serum exposure 


Prognosis for 
metastasis/survival 




Lung metastasis 
signature 


Minn ef al. 


2005 


54 


Comparison of highly and weakly 
lung-metastatic cell populations 
derived from the breast cancer 
cell line MDA-MB-231 


Prognosis for lung 
metastasis 




Brain metastasis 
signature 


dos et at. 
[20] 


2009 


243 


Comparison of cell lines with 
different metastatic potentials 
derived from the breast cancer 
cell lines MDA-MB-231 and CN34 


Prognosis for brain 
metastasis 




Bone metastasis 
signature 


Kang et al. 
[21] 


2003 


1 02 


f\/ir^A J\ A □ 131 krn^rt r-\nrA^ /-nil linn 

iviuA-Jvib-z3 i Dreasi cancer cen line 
+12 derivative subpopulations with 
different metastatic potentials 


Prognosis for bone 
metastasis 




Invasiveness 
signature 


Liu et al. 

[22] 


2007 


1 86 


CD44+CD24 - /low breast cancer cells 
with high tumorgenic capacity vs. cells 
of normal breast epithelium 


Prognosis for overall/ 
metastasis-free survival 




General metastasis Ramaswamy 
signature et al. [23] 


2003 


128 


64 primary adenocarcinomas of diverse 
origin (lung, breast, prostate, 
colorectal, uterus, ovary) vs. 12 
unmatched adenocarcinoma metastasis 


Metastatic potential, 
clinical outcome 




Meta gene 
signature 


Gyorffy 
et al. [24] 


2009 


376 


Meta-analysis of 20 published gene 
signatures on 7 breast cancer 
microarray data sets (n = 1079) 


Prognosis for 
relapse-free survival 




374 Gene 

Set/consensus 

genes 


Lauss 
et al. [25] 


2008 


374 


Meta-analysis of 44 published gene 
signatures on 8 breast 
cancer microarray data 
sets (n = 1067) 


Prognosis for 
survival 
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Enrichment Analyses of the Gene Expression Signatures 

Literature was searched for commonly used prognostic gene 
signatures derived from breast cancer expression data. We selected 
twelve signatures for further pathway enrichment analysis (table 7). 
Mammaprint [14], Oncotype DX [15], MapQuant [16], Gene 
Search [17] and the fibroblast core serum response (CSR) 
signature, commonly known as wound response signature [18], 
are well established, often cited in literature and commercially 
available. We also included five gene signatures based on 
expression data of metastatic breast cancer or metastatic 
adenocarcinomas of diverse origin [19-23], because metastasis 
formation has a profound impact on patients' survival. Last, we 
added two prognostic gene signatures based on meta-analyses of 
published gene expression signatures and microarray data sets of 
breast tumors [24,25] to evaluate how a combination of several 
prognostic gene signatures influences the enrichment analysis and 
if this result is comparable to the one obtained by the GWAS. 

The signatures could be divided in four subgroups. The 
commercially available gene signatures Mammaprint, Oncotype 



DX, MapQuant and Gene Search were dominated by the terms 
cell cycle and development (table 6). Also the two meta-analyses 
showed enrichment of genes involved in cell cycle (34% of the 50 
top pathways each), as did the general metastasis signature (16%). 
The specific lung, brain and bone metastasis signatures showed a 
strong connection to the generic terms immune response and 
development, which represented about 30% and 20% of the 
enriched pathways, respectively, and they were lacking pathways 
associated with cell cycle. The wound response and invasiveness 
signature did not show any specific pattern. 

Comparison of the GWAS and the Gene Expression 
Signatures 

In order to evaluate the similarities between the GWAS and the 
gene expression signatures, we analyzed the GWAS gene signature 
and every prognostic gene expression signature simultaneously 
with the MetaCore GeneGo pathway enrichment analysis tool to 
get a detailed view on their common pathways (figure 3, Figures 
S 1 — S 11). In this analysis the two gene lists were investigated for 
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Figure 3. Top 25 GeneGO pathways enriched simultaneously by the 0.01 gene list and general metastasis signature; red 
numbers = significant at FDR of 0.05; green box = pathway significantly enriched by both gene lists; Pathway "Airway smooth 
muscle contraction in asthma" was placed at rank 7, pathway "Cytoskeleton remodeling_Cytoskeleton remodeling" was placed at 
rank 9. 

doi:10.1371/journal.pone.0098229.g003 



overrepresented pathways and compared to each other. Only 
pathways enriched by both gene lists are displayed and ranked by 
their enrichment /?-values. The pathways which were significandy 
enriched by both gene signatures at the same time were a rare 
event. In all simultaneous analyses, only three pathways passed the 
0.05 FDR significance level in both analyzed gene lists contem- 
poraneously. Two of them were enriched in the analysis of our 
GWAS gene list together with the general metastasis signature 
(figure 3). These were the "Airway smooth muscle contraction in 
asthma" pathway placed at rank 7 (P 0 01 gene liat = 5. 3xl0~ 3 : 
Pgensrai metastasis signature = 7 . 9 x 1 0 " 4 ) and the "Cytoskelton remo- 
deling_Cytoskelton remodeling" pathway (Po.oi gene 
list = 1-7x10 ; Pgenerai metastasis signature = 9-7 x 10 ) placed at rank 
9. 

"Airway smooth muscle contraction in asthma" (figure 4) is 
almost identical to the top pathway in this analysis, "Muscle 
contraction_GPCRs in the regulation of smooth muscle tone" 
(figure SI 2), showing a clear connection to calcium ion binding, 
with the Ca 2+ -ions containing endoplasmatic reticulum and the 
associated proteins as one central part of these pathways. Several 
proteins of these pathways can also be found in the GeneGo 
pathway "Cytoskeleton remodeling_Cytoskeleton remodeling" 
(figure 5). This pathway combines several sub-pathways, many 
of them involved in cell adhesion. These include the pathways 
"ECM-receptor interaction", "focal adhesion" and "adherens 
junction". Also links to the well-known cancer pathways "TGF-p 1 
signaling" and "Wnt signaling" are observed. "Cytoskeleton 
remodeling_Cytoskeleton remodeling" was also significandy en- 
riched by the brain metastasis gene signature (Pbrain metastasis 
signature = 2.5 xlO - ' 5 ) and placed at rank 15 (figure S7). 

The third pathway significandy overrepresented by the 0.01 
gene list and a gene expression signature in the simultaneous 
analyses was "Neurophysiological process_Recep tor-mediated 



axon growth repulsion" (figure 6), which was significantly enriched 
in the analysis together with the Oncotype DX signature and 
placed at rank 4 (P 0 . 01 gene list = 9. 1 x 1 0" : '; P 0 „eotype dx = 6.4 x 10" 5 ) 
(figure S2). Also this pathway has a connection to the calcium 
signaling and cell adhesion pathway. 

Discussion 

The aim of our study was to set data derived from a GWAS on 
breast cancer survival into a global context by using a systematic 
pathway enrichment analysis with the two independent databases 
GO and KEGG as basis. In this process, the GO database was 
searched for overrepresented terms on a higher level of 
abstraction. A more detailed and focused view was achieved by 
using the data of the KEGG. By this way we gained eleven 
consistently enriched categories, two more general GO terms and 
nine specific KEGG pathways, which may have an influence on 
BC survival. A gene overlap of up to 87% between six of these 
categories revealed a strong connection to cell adhesion and 
included the KEGG pathways "adherens junction", "axon 
guidance" "ECM-receptor interaction", "focal adhesion", and 
"small cell lung cancer" and the GO term "cell adhesion". The 
second category with a high proportion of overlapping genes 
involved in calcium ion binding included the GO term "calcium 
ion binding" and the KEGG pathways "arrythmogenic right 
ventricular cardiomyopathy", "calcium signaling", "dilated car- 
diomyopathy" and "O-glycan biosynthesis". There was also an 
overlap of 20-30% between the two overarching categories, which 
emphasizes the interplay of cellular adhesion processes and 
calcium signaling as an important process in breast cancer 
survival. 

In the second part of our study we compared the pathway 
enrichment results of our GWAS data to those of twelve 
prognostic gene expression signatures. Simultaneously conducted 
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Figure 4. GeneGo pathway "Airway smooth muscle contraction in asthma". Barometers: 1 = 0.01 gene list; 2 = general metastasis signature, 
red = Calcium signaling pathway, blue = Smooth muscle contraction/relaxation. 
doi:1 0.1 371 /journal.pone.0098229.g004 



pathway enrichment analyses with each of the expression 
signatures revealed that the "Airway smooth muscle contraction 
in asthma", the "Cytoskeleton remodeling" and the "Neurophys- 
iological process_Receptor-mediated axon growth repulsion" 
pathways were the only ones which were significantly overrepre- 
sented by both the GWAS and a gene expression signature. The 
gene expression signatures involved were the general metastasis 
signature with two simultaneously enriched pathways and the 
specific brain metastasis signature and Oncotype DX each with 
one simultaneously enriched pathway, respectively. The general 
metastasis signature was derived from a comparison of gene 
expression data of adenocarcinomas of diverse origin (lung, breast, 



prostate, colorectal, uterus, ovary) with the corresponding 
metastases leading to 128 genes that distinguished best between 
primary tumors and metastases [23]. The brain metastasis 
signature is based on a genome-wide expression analysis of two 
BC cell lines and their highly brain metastatic cell derivates [20]. 
The comparison of the gene expression profiles led to 243 
differentially expressed genes, which were used as a brain 
metastasis signature in our study. The Oncotype DX signature 
was generated by a hypothesis driven search of the literature and 
databases for candidate cancer genes, which were tested for their 
correlation with disease recurrence in three independent breast 
cancer studies [15]. The sixteen best performing genes and five 
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Figure 5. GeneGo pathway "Cytoskeleton remodeling". Barometer: 1 =0.01 gene list; 2 = general metastasis signature; 3 = brain metastasis 
signature, orange = ECM-receptor interaction, purple = Adherens junction pathway, red = Calcium signaling pathway, pink = Focal adhesion pathway, 
yellow = TGF-p signaling pathway, green = Wnt signaling pathway, blue = Smooth muscle contraction/relaxation. 
doi:1 0.1 371 /journal.pone.0098229.g005 



reference genes were used to calculate a recurrence score. These 
three gene expression signatures are based on genes which are 
already known for their involvement in cancer (Oncotype DX) or 
which are associated with the metastasis forming process (general 
and brain metastasis signature). Together, the three simultaneous- 
ly enriched pathways picture well a possible interaction of the cell 
adhesion pathways with the calcium signaling pathway in the 



metastatic process and the patients' survival probabilities 
(figures 4—6). 

Calcium signaling and cell adhesion interact in various ways 
with each other and play an important role in metastasis, which 
involves detachment from the solid primary tumor, migration and 
invasion in a foreign tissue [26]. For example, E-cadherin as a key 
cell-to-cell adhesion molecule, essentially requires Ca + -ions to 
form homophilic interactions between two neighboring cells in 
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Figure 6. GeneGo pathway "Neurophysiological process Receptor-mediated axon growth repulsion". Barometers: 1 =0.01 gene list; 
2 = Oncotype DX. red = Calcium signaling pathway. 
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adherens junction [27]. Its down-regulation or inactivation in 
carcinomas has been reported to result in reduced cell adhesion 
[28,29] making it as a major suppressor of metastasis. Also focal 
adhesions, as the main linkage point between the cells and the 
extra cellular matrix (ECM), are influenced by calcium. Focal 
adhesion turnover, which determines the efficiency of cell 
migration, is regulated by calcium signaling. An important 
component in this process is focal adhesion kinase (FAK), which 
is a contact point for diverse extracellular stimuli, including Ca 2+ - 
concentration. FAK coordinates signals between integrins, the 
attachment molecules to the ECM, and growth factor receptors 
and promotes cell migration [30]. These examples point to the 



regulation of the metastasis formation either directly through 
mutations in the involved adhesion molecules or indirectly through 
impaired "calcium signaling". 

Metastases are the leading cause of death of cancer patients and 
therefore strongly connected to patients' survival. This was also 
reflected in our study population: short-time survivors tended to 
have tumors with higher stage than long-time survivors. As our 
data is based on a GWAS on BC survival comparing women with 
short-time survival to those with long-time survival, the results of 
our pathway enrichment analyses reflect the impact of the invasive 
tumor phenotype on the survival of a patient. Moreover, the 
comparison analyses with the pathway enrichment results of 
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commonly used prognostic gene expression signatures support our 
conclusion. 

Although pathway enrichment tools are able to put the GWAS 
data into a global context, there are some points which need to be 
considered [12,31]. Large genes with more SNPs are more likely 
to contain associated SNPs by chance alone than small genes. To 
avoid this bias, we annotated the SNPs to a gene both by excluding 
and including a 20 kb up- and downstream sequence of the gene. 
Only the best SNP (i.e. the one with the lowest Rvalue) per gene 
was included in the analysis. The 20 kb limit was applied because 
the average length of haplotype blocks in the CEU population 
ranges between 5.9 kb (calculation method based on the four 
gamete test) and 16.3 kb (calculation method based on a 
composite of local D' values) [32]. The pathway enrichment tools 
themselves also suffer from some limitations, which we experi- 
enced in our study. Even though the conditions were identical in 
all analyses, the different tools showed large variability in the 
number of overrepresented categories and their corresponding p- 
values [33]. The reasons for this variation include the source and 
the version of the annotation files, the annotation level used by the 
tool, the statistical model applied for the enrichment analysis, the 
correction for multiple testing, and the background gene set, which 
is used to calculate the ^-values for the overrepresented pathways 
[34]. One way to avoid the problem of inconsistent results 
obtained by different tools is to use several tools and to compare 
the results with each other. In our study, we analyzed four gene 
lists derived from the GWAS on BC survival with six tools and 
compared the results to detect true, consistendy enriched 
categories. 

In conclusion, our pathway enrichment analysis of the high- 
throughput data from a GWAS on BC survival revealed an 
influence of cell adhesion and calcium signaling on BC patients' 
survival. This was also confirmed by our comparison to the 
enrichment analyses of twelve prognostic gene expression signa- 
tures. The known high impact of metastasis on a patients' survival 
is supported by our genetic data, which also highlights the 
influence of changes in cell adhesion and calcium signaling in the 
metastatic process. Therefore, a further investigation of the 
identified pathways and the defined mechanisms of metastasis is 
a promising target to get classifiers for the patients' survival. 

Supporting Information 

Figure SI Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and Mammaprint. red 

numbers = significant at FDR of 0.05. 
(TIF) 

Figure S2 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and Oncotype DX. red 

numbers = significant at FDR of 0.05; green box = "Neurophysio- 
logical process_Receptor mediated axon growth repulsion" 
pathway significandy enriched by both gene lists at rank 4. 
(TIF) 

Figure S3 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and MapQuant. red 

numbers = significant at FDR of 0.05. 
(TIF) 

Figure S4 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and Gene Search, red 

numbers = significant at FDR of 0.05. 
(TIF) 

Figure S5 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and wound response 



signature, red numbers = significant at FDR of 0.05; green 

box = pathway significantly enriched by both gene lists. 

(TIF) 

Figure S6 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and the lung metastasis 
signature, red numbers = significant at FDR of 0.05. 
(TIF) 

Figure S7 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and the brain metastasis 
signature, red numbers = significant at FDR of 0.05; green 
box = "Cytoskeleton remodeling_ Cytoskeleton remodeling" path- 
way, significantly enriched by both gene lists at rank 15. 
(TIF) 

Figure S8 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and the bone metastasis 
signature, red numbers = significant at FDR of 0.05. 

(TIF) 

Figure S9 Top 25 GeneGO pathways enriched simulta- 
neously by the 0.01 gene list and the invasiveness 
signature, red numbers = significant at FDR of 0.05. 

(TIF) 

Figure S10 Top 25 GeneGO pathways enriched simul- 
taneously by the 0.01 gene list and the meta gene 
signature, red numbers = significant at FDR of 0.05. 

(TIF) 

Figure Sll Top 25 GeneGO pathways enriched simul- 
taneously by the 0.01 gene list and the consensus gene 
signature, red numbers = significant at FDR of 0.05. 
(TIF) 

Figure SI 2 GeneGo pathway "Muscle contrac- 
tion_GPCRs in the regulation of smooth muscle tone". 

Barometers: 1=0.01 gene list; 2 = general metastasis signature; 

red = Calcium signaling pathway. 

(TIF) 

Table SI Detailed characteristics of the whole study 
population. 

(DOCX) 

Table S2 General characteristics of sub-populations 
used in the GWAS. 

(DOCX) 

Table S3 Used pathway enrichment tools and their 
features. 

(DOCX) 

Table S4 Top 50 GeneGo pathways enriched by the 0.01 
gene list. 

(DOCX) 
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